The Cartesian Cafe
Greg Yang | Large N Limits: Random Matrices & Neural Networks

Update: 2023-01-04

Description

Greg Yang is a mathematician and AI researcher at Microsoft Research who for the past several years has done incredibly original theoretical work in the understanding of large artificial neural networks. Greg received his bachelor's degree in mathematics from Harvard University in 2018 and while there won the Hoopes Prize for best undergraduate thesis. He also received an Honorable Mention for the Morgan Prize for Outstanding Research in Mathematics by an Undergraduate Student in 2018 and was an invited speaker at the International Congress of Chinese Mathematicians in 2019.


In this episode, we get a sample of Greg's work, which goes under the name "Tensor Programs" and currently spans five highly technical papers. The route chosen to compress Tensor Programs into the scope of a conversational video is to place its main concepts under the umbrella of one larger, central, and time-tested idea: that of taking a large N limit. This idea appears most famously in the Law of Large Numbers and the Central Limit Theorem, which then play a fundamental role in the branch of mathematics known as Random Matrix Theory (RMT). We review this foundational material and then show how Tensor Programs (TP) generalizes this classical work, offering new proofs of classical RMT results. We conclude with applications of Tensor Programs to a (rare!) rigorous theory of neural networks.
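
As a small numerical illustration of the large N limits at the center of the conversation (not from the episode itself; a minimal sketch assuming only NumPy), the snippet below checks the Law of Large Numbers / Central Limit Theorem scaling and the Wigner semicircle law of Random Matrix Theory:

import numpy as np

rng = np.random.default_rng(0)
N = 2000

# Law of Large Numbers / Central Limit Theorem: the empirical mean of N iid
# samples concentrates at the true mean, with Gaussian fluctuations of size
# ~ 1/sqrt(N).
samples = rng.exponential(scale=1.0, size=N)          # mean 1, variance 1
print("empirical mean:", samples.mean(), "-> 1 as N grows")
print("sqrt(N) * (mean - 1):", np.sqrt(N) * (samples.mean() - 1.0),
      "= order-1 Gaussian fluctuation")

# Random Matrix Theory: eigenvalues of a large symmetric random matrix with
# entry variance ~ 1/N follow the Wigner semicircle law on [-2, 2].
A = rng.normal(size=(N, N)) / np.sqrt(N)
W = (A + A.T) / np.sqrt(2)                            # symmetric, off-diagonal variance 1/N
eigs = np.linalg.eigvalsh(W)
print("eigenvalue range:", (eigs.min(), eigs.max()), "-> [-2, 2]")
print("fraction of eigenvalues in [-1, 1]:", np.mean(np.abs(eigs) <= 1.0),
      "; semicircle law predicts ~0.609")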


Patreon: https://www.patreon.com/timothynguyen


Part I. Introduction


  • 00:00:00 : Biography

  • 00:02:45 : Harvard hiatus 1: Becoming a DJ

  • 00:07:40 : I really want to make AGI happen (back in 2012)

  • 00:09:09 : Impressions of Harvard math

  • 00:17:33 : Harvard hiatus 2: Math autodidact

  • 00:22:05 : Friendship with Shing-Tung Yau

  • 00:24:06 : Landing a job at Microsoft Research: Two Fields Medalists are all you need

  • 00:26:13 : Technical intro: The Big Picture

  • 00:28:12 : Whiteboard outline


Part II. Classical Probability Theory



Part III. Random Matrix Theory



Part IV. Tensor Programs


  • 1:31:03 : Segue using RMT

  • 1:44:22 : TP punchline for RMT

  • 1:46:22 : The Master Theorem (the key result of TP; roughly paraphrased after this outline)

  • 1:55:04 : Corollary: Reproof of RMT results

  • 1:56:52 : General definition of a tensor program
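
For orientation (a rough, hedged paraphrase, not the precise statement; see the papers for the exact hypotheses): a tensor program builds vectors in R^n out of initial vectors with iid Gaussian coordinates by alternating matrix multiplications h = W x, where W has iid N(0, 1/n) entries, with coordinatewise nonlinearities h = phi(x^1, ..., x^k). The Master Theorem covered at 1:46:22 then says, roughly, that for any polynomially bounded test function psi,

  \frac{1}{n}\sum_{i=1}^{n} \psi\!\left(h^1_i,\ldots,h^k_i\right)
  \;\longrightarrow\;
  \mathbb{E}\,\psi\!\left(Z^{h^1},\ldots,Z^{h^k}\right)
  \quad \text{almost surely as } n \to \infty,

where the scalar random variables Z^h are defined by a recursion that mirrors the program.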


Part V. Neural Networks and Machine Learning


  • 2:09:05 : Feed forward neural network (3 layers) example

  • 2:19:16 : Neural network Gaussian Process (see the numerical sketch after this outline)

  • 2:23:59 : Many distinct large N limits for neural networks

  • 2:27:24 : abc parametrizations (Note: "a" is absorbed into "c" here): variance and learning rate scalings

  • 2:36:54 : Geometry of space of abc parametrizations

  • 2:39:41 : Kernel regime

  • 2:41:32 : Neural tangent kernel

  • 2:43:35 : (No) feature learning

  • 2:48:42 : Maximal feature learning

  • 2:52:33 : Current problems with deep learning

  • 2:55:02 : Hyperparameter transfer (muP)

  • 3:00:31 : Wrap up
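
As a small numerical sketch of the Neural Network Gaussian Process limit covered at 2:19:16 (not from the episode; it assumes only NumPy and uses a one-hidden-layer ReLU network for concreteness):

import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 10, 4096, 2000

x = rng.normal(size=d)    # a fixed input

# One-hidden-layer ReLU network with random weights:
#   f(x) = (1/sqrt(n)) * v . relu(W x),  W_ij ~ N(0, 1/d),  v_j ~ N(0, 1).
# As the width n -> infinity, the output f(x) over random draws of (W, v)
# converges in distribution to a centered Gaussian: the NNGP limit.
def f(x):
    W = rng.normal(size=(n, d)) / np.sqrt(d)
    v = rng.normal(size=n)
    return v @ np.maximum(W @ x, 0.0) / np.sqrt(n)

outputs = np.array([f(x) for _ in range(trials)])

# Limiting variance: E[relu(g)^2] with g ~ N(0, q), q = |x|^2 / d, which equals q/2.
pred_std = np.sqrt((x @ x) / d / 2)
print("mean over draws:", outputs.mean(), "-> 0")
print("std over draws: ", outputs.std(), " vs predicted", pred_std)
print("P(|f(x)| <= predicted std):", np.mean(np.abs(outputs) <= pred_std),
      "; a Gaussian gives ~0.683")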


Further Reading:


Tensor Programs I, II, III, IV, V by Greg Yang and coauthors.


Twitter: @iamtimnguyen


Webpage: http://www.timothynguyen.org
